The Cambridge Learner Corpus: error coding and analysis for lexicography and ELT
Abstract
The Cambridge Learner Corpus is a 16-million-word corpus of learner English collected by Cambridge University Press (CUP) in collaboration with the University of Cambridge Local Examinations Syndicate (now Cambridge ESOL). It comprises English examination scripts, transcribed with all errors retained, written by learners of English with 86 different mother tongues. The scripts range across eight EFL examinations and cover both general and business English. To date, a 6-million-word component of the corpus has been error-coded using a system devised at CUP specifically for the Cambridge Learner Corpus. Most codes follow a two-letter system in which the first letter represents the general type of error (e.g. wrong form, omission) and the second identifies the word class of the required word. There are 88 possible codes in all. This paper will describe the coding system and the corpus tools used to analyse the coded corpus, and will demonstrate the benefits this coding and analysis bring to lexicographers and to writers of other ELT books at CUP.

1. Description of the corpus

Since 1993, Cambridge University Press, in collaboration with the University of Cambridge Local Examinations Syndicate (now Cambridge ESOL), has compiled a 16-million-word corpus of learner English. Students' examination essays are carefully transcribed, reproducing all errors, checked for inputter-generated errors, and stored in the corpus along with candidate details and examination scores. The corpus is growing all the time and currently contains more than 16 million words. Eighty-six mother tongues are represented, and more than 15 of them are each represented by more than 350,000 words. The error-coded component of the corpus currently contains 6 million words.

A profile of each candidate is given for each examination script.
This includes information on each student's first language, age, sex, educational history and years of English study. This information can be used to set the parameters for creating subcorpora. For example, it is possible to isolate for analysis the English of very young learners, or of learners at a particular examination level, with a particular mother tongue or from a particular language group. Any combination of these details can be used to create a subcorpus.
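To illustrate how a two-letter code of the kind described in the abstract separates into its two parts (general error type, then word class of the required word), here is a minimal Python sketch. The specific letter assignments are an assumption made for illustration; this section does not list them, and the authoritative inventory of 88 codes is the CLC coding scheme itself. Codes that do not fit the two-letter pattern (the abstract says only the majority do) are reported as unclassified rather than guessed at.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical letter assignments, assumed for illustration only;
# they are not taken from this paper's description of the scheme.
ERROR_TYPES = {
    "F": "wrong form",
    "M": "missing (omission)",
    "R": "replace",
    "U": "unnecessary",
}
WORD_CLASSES = {
    "V": "verb",
    "N": "noun",
    "J": "adjective",
    "Y": "adverb",
    "D": "determiner",
    "T": "preposition",
    "A": "pronoun",
    "C": "conjunction",
    "Q": "quantifier",
}


def decode(code: str) -> tuple[str, str] | None:
    """Split a two-letter code into (error type, word class).

    Returns None for any code outside the assumed two-letter pattern.
    """
    if len(code) != 2:
        return None
    error_type, word_class = code[0], code[1]
    if error_type in ERROR_TYPES and word_class in WORD_CLASSES:
        return ERROR_TYPES[error_type], WORD_CLASSES[word_class]
    return None


def tally_by_word_class(codes: list[str]) -> Counter:
    """Count codes by the word class of the required word."""
    counts: Counter = Counter()
    for code in codes:
        decoded = decode(code)
        counts[decoded[1] if decoded else "unclassified"] += 1
    return counts


# Made-up sample codes; "S" stands in for a code outside the two-letter pattern.
sample_codes = ["FV", "MD", "RT", "UD", "RT", "S"]
print(decode("FV"))
print(tally_by_word_class(sample_codes))
```

Splitting on the two positions is what makes this kind of analysis cheap: a lexicographer interested in, say, prepositions can aggregate every error type for that word class without listing each code separately.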
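The subcorpus creation described above amounts to filtering scripts on their candidate profiles. The following is a rough Python sketch under stated assumptions: the `Script` record, its field names, the exam labels and the sample texts are all hypothetical and do not reflect the corpus's actual storage format or query tools. Each criterion is either an exact value or a predicate, so combinations such as "Spanish speakers under 16" can be expressed in one call.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Script:
    """One examination script plus its candidate profile (hypothetical fields)."""
    script_id: str
    first_language: str
    age: int
    sex: str
    exam: str
    years_of_study: int
    text: str


def make_subcorpus(scripts: list[Script], **criteria) -> list[Script]:
    """Keep scripts whose profile satisfies every criterion.

    Each keyword names a Script field; its value is either an exact
    value to match or a callable predicate applied to the field.
    """
    def matches(script: Script) -> bool:
        for field_name, wanted in criteria.items():
            value = getattr(script, field_name)
            if callable(wanted):
                if not wanted(value):
                    return False
            elif value != wanted:
                return False
        return True

    return [s for s in scripts if matches(s)]


def word_count(scripts: list[Script]) -> int:
    """Total whitespace-delimited tokens in a subcorpus."""
    return sum(len(s.text.split()) for s in scripts)


# Made-up records for illustration; errors are retained as in the transcriptions.
corpus = [
    Script("s1", "Spanish", 14, "F", "EXAM_A", 3, "I go to school yesterday ."),
    Script("s2", "Greek", 25, "M", "EXAM_B", 8, "He have many friends ."),
    Script("s3", "Spanish", 16, "M", "EXAM_A", 5, "She is agree with me ."),
]

# Combine a mother-tongue criterion with an age predicate.
young_spanish = make_subcorpus(corpus, first_language="Spanish", age=lambda a: a < 16)
print([s.script_id for s in young_spanish], word_count(young_spanish))
```

Accepting predicates as well as exact values keeps the filter general enough for ranges (age, years of study) without a separate function per field.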